DOC: update the DataFrame.count docstring #20221

joders · 2018-03-10T19:17:07Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `NaN` and
`None`)

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If equal 0 or 'index' counts are generated for each column.
    If equal 1 or 'columns' counts are generated for each row.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular level, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If level is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
>>> df=pd.DataFrame({ "Person":["John","Myla",None],
...                   "Age":[24.,np.nan,21.],
...                   "Single":[False,True,True]     })
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2    None  21.0    True
>>> df.count()
Person    2
Age       2
Single    3
dtype: int64
>>> df.count(axis=1)
0    3
1    2
2    2
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

TomAugspurger · 2018-03-10T19:41:53Z

pandas/core/frame.py

+
+        Return Series with number of non-NA observations over requested
+        axis. Works with non-floating point data as well (detects `NaN` and
+        `None`)


Could you also add NaT here (that's our missing value for datetime data)
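A minimal sketch of the behaviour this asks the docstring to mention, assuming a recent pandas: a missing datetime is stored as `NaT`, and `count` skips it just as it skips `NaN` and `None`. The column name `when` and the dates are made up for illustration.

```python
import pandas as pd

# A datetime column where one entry is missing; pandas stores it as NaT
df = pd.DataFrame({"when": pd.to_datetime(["2018-03-10", None, "2018-03-12"])})

print(df["when"].isna().tolist())  # [False, True, False]

# count() treats the NaT cell as missing, like NaN and None
counts = df.count()
print(counts["when"])  # 2
```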

TomAugspurger · 2018-03-10T19:42:28Z

pandas/core/frame.py

-        level : int or level name, default None
-            If the axis is a MultiIndex (hierarchical), count along a
-            particular level, collapsing into a DataFrame
+            If equal 0 or 'index' counts are generated for each column.


I think you can remove "equal" from this line and the next.
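For reference, a small sketch of the `axis` aliases described in those two lines, reusing the data from the first version of the example and assuming a recent pandas: `0`/`'index'` gives one count per column and `1`/`'columns'` one count per row. The printed values match the counts in the validation output above.

```python
import numpy as np
import pandas as pd

# Same data as the first version of the docstring example
df = pd.DataFrame({"Person": ["John", "Myla", None],
                   "Age": [24., np.nan, 21.],
                   "Single": [False, True, True]})

# 0 and 'index' are aliases: one count per column
per_column = df.count(axis=0)
assert per_column.equals(df.count(axis="index"))

# 1 and 'columns' are aliases: one count per row
per_row = df.count(axis=1)
assert per_row.equals(df.count(axis="columns"))

print(per_column.tolist())  # [2, 2, 3]
print(per_row.tolist())     # [3, 2, 2]
```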

TomAugspurger · 2018-03-10T19:46:01Z

pandas/core/frame.py

+
+        Examples
+        --------
+        >>> df=pd.DataFrame({ "Person":["John","Myla",None],


PEP 8 on this example: spaces around `=`, no space after `{`, a space after `:`, and a space after `,`.

Could you also add an example with `level=`? I think you could:

Make the DataFrame two rows longer, repeating John and Myla.

Update the `df` output and the `df.count` examples.

Show `df.set_index(['Person', 'Single']).count(level='Person')`.
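A sketch of the suggested five-row frame and the per-level count. One caveat, assuming a recent pandas: the `level` keyword of `count` was deprecated in pandas 1.3 and removed in 2.0, so this version groups on the index level instead, which should produce the same per-`Person` counts as the `count(level='Person')` call proposed here (the row with a missing `Person` is dropped as a group key).

```python
import numpy as np
import pandas as pd

# Five-row frame from the suggestion: John and Myla repeated
df = pd.DataFrame({"Person": ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})

indexed = df.set_index(["Person", "Single"])

# Grouping on the "Person" index level counts non-NA "Age" values per
# person; the row whose Person is missing is not used as a group key.
by_person = indexed.groupby(level="Person").count()
print(by_person)
```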

jreback · 2018-03-11T14:40:03Z

pandas/core/frame.py

+        Series.count: number of non-NA elements in a Series
+        DataFrame.shape: number of DataFrame rows and columns (including NA
+            elements)
+        DataFrame.isnull: boolean same-sized DataFrame showing places of NA


refer to isna instead
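A quick sketch of how `isna`, its alias `isnull`, and `count` relate, using a cut-down version of the example frame and assuming a recent pandas: `count` per column equals the number of cells for which `notna` is true.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Person": ["John", "Myla", None],
                   "Age": [24., np.nan, 21.]})

# isnull is an alias of isna; both flag the same cells
assert df.isna().equals(df.isnull())

# count() per column is the number of cells that are *not* NA
assert df.count().tolist() == df.notna().sum().tolist()
print(df.isna())
```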

jreback · 2018-03-11T14:40:13Z

pandas/core/frame.py

+        2    None  21.0    True
+        3    John  33.0    True
+        4    Myla  26.0   False
+        >>> df.count()


blank line between cases

TomAugspurger

Can you paste the output of the doc validation script again?

TomAugspurger · 2018-03-11T16:32:41Z

pandas/core/frame.py

+
+        Return Series with number of non-NA observations over requested
+        axis. Works with non-floating point data as well (detects `None`,
+        `NaN` and `NaT`)


End with a .

TomAugspurger · 2018-03-11T16:32:53Z

pandas/core/frame.py

+            If 1 or 'columns' counts are generated for each **row**.
+        level : int or str, optional
+            If the axis is a `MultiIndex` (hierarchical), count along a
+            particular level, collapsing into a `DataFrame`.


backticks around the `level` parameter.

joders · 2018-03-11T17:12:22Z

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `None`,
`NaN` and `NaT`).

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If 0 or 'index' counts are generated for each column.
    If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular `level`, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If `level` is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isna: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2    None  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    4
Age       4
Single    5
dtype: int64

Counts for each **row**:

>>> df.count(axis='columns')
0    3
1    2
2    2
3    3
4    3
dtype: int64

Counts for one level of a `MultiIndex`:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Myla      1

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)

TomAugspurger · 2018-03-11T17:15:35Z

pandas/core/frame.py

-        axis. Works with non-floating point data as well (detects NaN and None)
+        Count non-NA cells for each column or row.
+
+        Return Series with number of non-NA observations over requested


One last change: maybe remove the first sentence, since this can return a DataFrame when `level` is given.

I think just use the extended summary to say what counts as non-null data.

The values None, NaN, NaT, and optionally np.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

Do you mean the first sentence in the extended summary, i.e.:
"Return Series with number of non-NA observations over requested axis."

If I understand you correctly, I would change the entire summary (i.e. both the short and the extended summary) to look like the following:

Count non-NA cells for each column or row. The values None, NaN, NaT, and optionally np.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

Yeah. Change `np.inf` to `numpy.inf`, and put single backticks around `pandas.options.mode.use_inf_as_na`.
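A hedged sketch of the `numpy.inf` point, assuming a recent pandas: with default options, infinities are ordinary values and are counted. Later pandas releases (2.1) deprecated `mode.use_inf_as_na`, so the sketch converts infinities to `NaN` explicitly instead of toggling that option. The series values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.inf, -np.inf, np.nan])

# With default options, infinities are ordinary values and are counted
default_count = s.count()

# To treat infinities as missing without relying on the option,
# convert them to NaN first, then count
finite_count = s.replace([np.inf, -np.inf], np.nan).count()

print(default_count, finite_count)  # 3 1
```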

TomAugspurger · 2018-03-11T17:15:51Z

pandas/core/frame.py

+
+        >>> df = pd.DataFrame({"Person":
+        ...                    ["John", "Myla", None, "John", "Myla"],
+        ...                    "Age": [24., np.nan, 21., 33, 26],


PEP 8: indent one more space. Same with the line below.

For me, flake8 complains if I change that. On my system flake8 doesn't check the examples, so I copied the example into the code:

df = pd.DataFrame({"Person":
                   ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})
df

If I have it like this, flake8 only complains about `pd` not being defined:
pandas/core/frame.py:5672:14: F821 undefined name 'pd'
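The F821 warning comes from the examples assuming `pd` and `np` already exist; doctest-style checks run each example in a namespace supplied by the harness (pandas' pytest configuration adds `np` and `pd` to the doctest namespace). The sketch below is an illustrative stand-in for that setup, not the project's own harness: it uses only the standard-library `doctest` module, and `EXAMPLES` is a hypothetical cut-down docstring.

```python
import doctest

import numpy as np
import pandas as pd

# Hypothetical docstring examples that use `pd` and `np` without importing them
EXAMPLES = '''
>>> df = pd.DataFrame({"Person": ["John", "Myla", None],
...                    "Age": [24., np.nan, 21.]})
>>> df.count().tolist()
[2, 2]
'''

# Parse the examples and run them with `pd` and `np` pre-populated
# in the globals, so the names resolve at run time.
parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLES, {"pd": pd, "np": np},
                          "count_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)  # 0 2
```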

Sorry, I misread.

joders · 2018-03-12T13:15:50Z

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

The values `None`, `NaN`, `NaT`, and optionally `numpy.inf` (depending
on `pandas.options.mode.use_inf_as_na`) are considered NA.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If 0 or 'index' counts are generated for each column.
    If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular `level`, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If `level` is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isna: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2    None  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    4
Age       4
Single    5
dtype: int64

Counts for each **row**:

>>> df.count(axis='columns')
0    3
1    2
2    2
3    3
4    3
dtype: int64

Counts for one level of a `MultiIndex`:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Myla      1

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)
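The final docstring documents `numeric_only` but has no example for it. A minimal sketch on the same five-row frame, assuming a recent pandas where boolean columns count as numeric data for this purpose (which matches the "`float`, `int` or `boolean`" wording above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Person": ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})

# numeric_only=True drops the object column "Person";
# the float column "Age" and the boolean column "Single" are kept
numeric_counts = df.count(numeric_only=True)
print(list(numeric_counts.index), numeric_counts.tolist())  # ['Age', 'Single'] [4, 5]
```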

TomAugspurger · 2018-03-12T14:24:54Z

Thanks @joders!

joders · 2018-03-13T20:00:49Z

thanks for providing pandas

DataFrame.count docstring reworked, examples added

83a7c88

aterrel approved these changes Mar 10, 2018

TomAugspurger added the Docs label Mar 10, 2018

TomAugspurger reviewed Mar 10, 2018

review corrections

8d76f60

jreback requested changes Mar 11, 2018

review fixes + better description

fd11167

TomAugspurger reviewed Mar 11, 2018

review fixes, full stop and backticks

bbe96aa

TomAugspurger reviewed Mar 11, 2018

review fix: changed summary, added quoting

dbb84eb

TomAugspurger added this to the 0.23.0 milestone Mar 12, 2018

TomAugspurger merged commit 0596cb1 into pandas-dev:master Mar 12, 2018
